---
title: Defining redundant features
description: How does DataRobot define similarity in features and call them redundant?
---

# Defining redundant features {: #defining-redundant-features }

<span style="color:red;font-size: 1rem"> `Robot 1`</span>

**What makes a feature redundant?**

The [docs](feature-impact#remove-redundant-features-automl) say:

> If two features change predictions in a similar way, DataRobot recognizes them as correlated and identifies the feature with lower feature impact as redundant

How do we quantify or measure "similar way"?

<span style="color:red;font-size: 1rem"> `Robot 2`</span>

If two features are highly correlated, their prediction differences (the prediction before a feature is shuffled minus the prediction after it is shuffled) **should also be correlated**, so the prediction difference can be used to evaluate pairwise feature correlation. Once a pair of highly correlated features is found, the one with lower feature impact is identified as the redundant feature.

<span style="color:red;font-size: 1rem"> `Robot 1`</span>

Do we consider two features redundant when their prediction differences are the same, or within a range between `-x%` and `+x%`?

<span style="color:red;font-size: 1rem"> `Robot 2`</span>

We look at the correlation coefficient between the prediction differences. If it's above a certain threshold, we call the less important feature (according to the model's feature impact) redundant.

<span style="color:red;font-size: 1rem"> `Robot 2`</span>

Specifically:

1. Calculate the prediction difference before and after the feature shuffle:

	`pred_diff[i] = pred_before[i] - pred_after[i]`

2. Calculate pairwise feature correlation (top 50 features, according to model feature impact) based on `pred_diff`.

3. Identify redundant features (those with high correlation, based on our threshold), then test that removing them does not significantly affect accuracy.
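
To make the steps above concrete, here is a minimal Python sketch using NumPy. It is an illustration under stated assumptions, not DataRobot's actual implementation. The function names (`prediction_differences`, `find_redundant`), the `0.9` threshold, the use of the Pearson coefficient, and the reuse of one row permutation for every feature are all hypothetical choices made for the example. The final accuracy check from step 3 is not shown.

```python
import numpy as np


def prediction_differences(predict, X, rng):
    """Step 1: pred_diff[i] = pred_before[i] - pred_after[i], per feature.

    `predict` is any callable that maps a 2-D array to a 1-D array of
    predictions. Assumption: the same row permutation is reused for every
    feature so the resulting differences are directly comparable.
    """
    pred_before = predict(X)
    perm = rng.permutation(X.shape[0])
    diffs = {}
    for j in range(X.shape[1]):
        X_shuffled = X.copy()
        X_shuffled[:, j] = X[perm, j]  # shuffle only column j
        diffs[j] = pred_before - predict(X_shuffled)
    return diffs


def find_redundant(pred_diffs, impact, threshold=0.9, top_n=50):
    """Steps 2 and 3 (first half): flag the lower-impact feature of each
    highly correlated pair among the top_n features by impact.

    `impact` maps feature index to an importance score. The threshold
    value is illustrative only.
    """
    # Highest impact first, so the second feature of a pair is the weaker one.
    features = sorted(pred_diffs, key=lambda j: impact[j], reverse=True)[:top_n]
    redundant = set()
    for pos, a in enumerate(features):
        if a in redundant:
            continue
        for b in features[pos + 1:]:
            if b in redundant:
                continue
            # Pearson correlation between the two prediction-difference vectors.
            r = np.corrcoef(pred_diffs[a], pred_diffs[b])[0, 1]
            if r > threshold:
                redundant.add(b)
    return redundant
```

Reusing one permutation matters in this sketch: if each feature were shuffled with an independent permutation, even two identical features would yield only moderately correlated prediction differences. A simple stand-in for feature impact when trying this out is the mean absolute prediction difference per feature.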

<span style="color:red;font-size: 1rem"> `Robot 1`</span>

Thank you, Robot 2! Super helpful.

